Skip to content

Fix large-message initialization in communication benchmark#1016

Open
pentschev wants to merge 2 commits intorapidsai:mainfrom
pentschev:fix-bench-comm-message-limit
Open

Fix large-message initialization in communication benchmark#1016
pentschev wants to merge 2 commits intorapidsai:mainfrom
pentschev:fix-bench-comm-message-limit

Conversation

@pentschev
Copy link
Copy Markdown
Member

bench_comm could fail while initializing input buffers for per-peer messages at around 8 GiB and larger, before the benchmark itself ran. This change allows large-message all-to-all cases to run correctly.

`bench_comm` could fail while initializing input buffers for per-peer
messages at around 8 GiB and larger, before the benchmark itself ran.
This change allows large-message all-to-all cases to run correctly.
@pentschev pentschev self-assigned this May 6, 2026
@pentschev pentschev requested a review from a team as a code owner May 6, 2026 21:04
@pentschev pentschev added bug Something isn't working non-breaking Introduces a non-breaking change labels May 6, 2026
Comment thread cpp/benchmarks/utils/random_data.cu Outdated
Comment on lines +31 to +34
RAPIDSMPF_EXPECTS(
nelem <= static_cast<std::size_t>(std::numeric_limits<index_t>::max()),
"random_device_vector size exceeds signed iterator range"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Do we consider 32-bit systems? For 64 bit nelem > INT64_MAX seems unlikely isnt it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always use safe_cast()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Do we consider 32-bit systems?

No: https://forums.developer.nvidia.com/t/whats-the-last-version-of-the-cuda-toolkit-to-support-32-bit-applications/323106/4

For 64 bit nelem > INT64_MAX seems unlikely isnt it?

Removed that.

Always use safe_cast()

Done.

}

std::unique_ptr<cudf::column> random_column(
cudf::size_type nrows,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this be std::size_t now to reflect the above change? Otherwise we will be clipping values above UINT32_MAX, isnt it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't make a column with more than size_type::max rows, so I think the point is moot?

Copy link
Copy Markdown
Member Author

@pentschev pentschev May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this be std::size_t now to reflect the above change? Otherwise we will be clipping values above UINT32_MAX, isnt it?

Not really, for cuDF columns/rows we should still use cudf::size_type, but that shouldn't limit us from generating/filling larger vectors (such as those we need for bench_comm that are completely unaware of cuDF).

Comment thread cpp/benchmarks/utils/random_data.cu Outdated
Comment on lines +75 to +90
buffer.size / sizeof(std::int32_t) + sizeof(std::int32_t),
(buffer.size + sizeof(std::int32_t) - 1) / sizeof(std::int32_t),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if buffer is empty now, we will be passing 0 to random_device_vector. I think random_fill was trying to prevent that before? I'm not sure.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now.


cudf::table random_table(
cudf::size_type ncolumns,
cudf::size_type nrows,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as #1016 (comment) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants